Efficient Reuse of Structured and Unstructured Resources for Ontology Population

Authors

  • Chetana Gavankar
  • Ashish Kulkarni
  • Ganesh Ramakrishnan
Abstract

We study the problem of ontology population for a domain ontology and present solutions based on semi-automatic techniques. A domain ontology for an organization often consists of classes whose instances are either specific to the organization or independent of it. For example, in an academic domain ontology, classes like Professor and Department could be organization (university) specific, while Conference and Programming languages are organization independent. This distinction allows us to leverage data sources both within the organization and on the Internet to extract entities and populate an ontology. We propose techniques that build on those for open-domain information extraction (IE). We show through comprehensive evaluation how these semi-automatic techniques, together with user input, achieve high precision. We experimented with the academic domain and built an ontology comprising over 220 classes. Intranet documents from five universities formed our organization-specific corpora, and we used open-domain knowledge bases like Wikipedia and Linked Open Data, and web pages from the Internet, as the organization-independent data sources. The populated ontology that we built for one of the universities comprised over 75,000 instances. We adhere to Semantic Web standards and tools and make the resources available in the OWL format. These could be useful for applications such as information extraction, text annotation, and information retrieval.
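
The split between organization-specific and organization-independent classes can be illustrated with a minimal sketch. This is not the authors' implementation: the class sets, the `source_for`, `populate`, and `to_turtle` helpers, the `http://example.org/onto#` namespace, and the sample labels are all hypothetical, and Turtle is used here only as one possible RDF serialization of instance data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical class sets, based on the examples given in the abstract.
ORG_SPECIFIC = {"Professor", "Department"}
ORG_INDEPENDENT = {"Conference", "ProgrammingLanguage"}


def source_for(cls: str) -> str:
    """Pick the kind of data source to mine for instances of a class."""
    if cls in ORG_SPECIFIC:
        return "intranet"   # documents inside the organization
    if cls in ORG_INDEPENDENT:
        return "open_web"   # e.g. Wikipedia, Linked Open Data, web pages
    raise KeyError(f"unknown class: {cls}")


@dataclass(frozen=True)
class Candidate:
    label: str  # extracted entity name
    cls: str    # ontology class it was extracted for


def populate(
    candidates: Iterable[Candidate],
    approve: Callable[[Candidate], bool],
) -> Dict[str, List[str]]:
    """Semi-automatic step: keep only candidates the user approves."""
    ontology: Dict[str, List[str]] = {}
    for cand in candidates:
        if approve(cand):
            ontology.setdefault(cand.cls, []).append(cand.label)
    return ontology


def to_turtle(ontology: Dict[str, List[str]],
              base: str = "http://example.org/onto#") -> str:
    """Serialize approved instances as simple Turtle type assertions."""
    lines = [f"@prefix ex: <{base}> ."]
    for cls in sorted(ontology):
        for label in ontology[cls]:
            local = "_".join(label.split())  # spaces are not valid in local names
            lines.append(f"ex:{local} a ex:{cls} .")
    return "\n".join(lines)


# Hypothetical candidates; the lambda stands in for interactive user review.
candidates = [
    Candidate("Jane Doe", "Professor"),
    Candidate("Example Conf", "Conference"),
    Candidate("Noise", "Professor"),
]
ontology = populate(candidates, approve=lambda c: c.label != "Noise")
turtle = to_turtle(ontology)
```

In this sketch the user-approval callback is what makes the process semi-automatic: extraction proposes candidates, and only confirmed ones become instances.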

Similar resources

Ontology Based Software Storage Repository to Enhance Software Reuse

Software reuse plays an important role in the world of software development. This leads to better development time and cost. Software reuse is not used much because of unavailability of needed components, unknown available components, unstructured storage of relevant components, improper retrieval of components, unwillingness of developers, etc. Unstructured storage of components is taken as a probl...

A Novel Software Reuse Method – An Ontological Approach

Software industries develop various projects in various domains and store them on disk as archives. These resources are not used to their fullest for future reuse because of unstructured storage and retrieval methods. In this paper, ontology-based storage and retrieval is proposed. The developer uses a domain-based ontology for understanding the domain of the project and the relevant semant...

Towards Ontological Structures Extraction from Folksonomies: An Efficient Fuzzy Clustering Approach

Folksonomies are one of the technologies of Web 2.0 that permit users to annotate resources on the Web. In this paper, the authors propose an integrated approach to extract ontological structures from unstructured and semi-structured resources. Our proposal overcomes limitations of existing approaches. It gives a formal, simple, and efficient solution to the tag clustering and disambiguation pr...

Automatic Annotation of Bioinformatics Workflows with Biomedical Ontologies

Legacy scientific workflows, and the services within them, often present scarce and unstructured (i.e. textual) descriptions. This makes it difficult to find, share and reuse them, thus dramatically reducing their value to the community. This paper presents an approach to annotating workflows and their subcomponents with ontology terms, in an attempt to describe these artifacts in a structured ...

Similarity of Medical Cases in Health Care Using Cosine Similarity and Ontology

The increasing use of digital patient records in hospitals saves time and reduces the risk of wrong treatments caused by lack of information. Digital patient records also enable efficient spread and transfer of experience gained from the diagnosis and treatment of individual patients, which nowadays is mostly manual (speaking with colleagues) and rarely aided by computerized systems. Most of the content...

Publication date: 2014